First task:

Do the exercise proposed in slide 59 (a linear relationship with a 45-degree rotation) and comment on the results.
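Slide 59 is not reproduced here. Assuming the exercise uses points scattered along the line y = x (a 45-degree direction), the minimal NumPy sketch below shows what PCA should recover: the eigenvectors of the covariance matrix act as a rotation of the original axes, with the first principal axis at about 45 degrees and almost all the variance on it. The `pca` helper, the sample size, and the noise level are illustrative assumptions.

```python
import numpy as np

def pca(data):
    """Return covariance eigenvalues (descending) and eigenvectors (as columns)."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
# Assumption: points along y = x (45 degrees) with small Gaussian scatter.
x = rng.normal(0.0, 1.0, 1000)
y = x + rng.normal(0.0, 0.1, 1000)
eigvals, eigvecs = pca(np.column_stack([x, y]))

# Angle of the first principal axis w.r.t. the x axis, folded into [0, 180)
# because the sign of an eigenvector is arbitrary.
angle = np.degrees(np.arctan2(eigvecs[1, 0], eigvecs[0, 0])) % 180
explained = eigvals / eigvals.sum()
print(f"first axis angle: {angle:.1f} deg, explained variance: {explained.round(3)}")
```

Increasing the scatter in y lowers the share of the first component, which is one way to comment on how robust the 45-degree rotation is.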

Second task:

Option 1: synthetic sample

Prepare a program that generates medium-size samples (about 1000 objects) with three parameters (random variables) per object, generated as follows:

  • x: Gaussian distribution N(0, sigmaX)
  • y = e^x + Gaussian noise N(0, sigmaErr)
  • z: Gaussian distribution N(0, sigmaZ)
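A minimal NumPy sketch of such a generator follows. The function name `generate_sample`, the seed handling, and the concrete values standing in for "small" and "large" in the three configurations listed next (with sigmaZ = 1 throughout) are hypothetical choices, not values fixed by the task.

```python
import numpy as np

def generate_sample(n, sigma_x, sigma_err, sigma_z, seed=None):
    """Return an (n, 3) array with columns x, y, z:
    x ~ N(0, sigma_x), y = exp(x) + N(0, sigma_err), z ~ N(0, sigma_z)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma_x, n)
    y = np.exp(x) + rng.normal(0.0, sigma_err, n)
    z = rng.normal(0.0, sigma_z, n)
    return np.column_stack([x, y, z])

# Hypothetical parameter values for the three requested samples.
samples = {
    "small_x_low_noise": generate_sample(1000, 0.1, 0.01, 1.0, seed=1),   # sigmaErr << sigmaX
    "large_x_low_noise": generate_sample(1000, 2.0, 0.01, 1.0, seed=2),   # sigmaErr << sigmaX
    "small_x_high_noise": generate_sample(1000, 0.1, 0.1, 1.0, seed=3),   # sigmaErr ~ sigmaX
}
```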
* Generate three samples with the above program:
  1. small sigmaX and sigmaErr << sigmaX
  2. large sigmaX and sigmaErr << sigmaX
  3. small sigmaX and sigmaErr ~ sigmaX
* Perform a first PCA on the samples and discuss the results.
* Next, define a set of alternative variables (x', y', z') as functional combinations of the original
ones, chosen to make the new variables more suitable for PCA (that is, so that the relationships
between them become linear; see the notes).
* Generate new files from the original samples using the new variables.
* Perform a new PCA on the new samples, compare the results with those of the first PCA, and
discuss the improvement.
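One possible choice of alternative variables, sketched under stated assumptions: since y = e^x + noise, taking x' = e^x, y' = y, z' = z turns the curved relation into y' = x' + noise, which is linear. The sketch standardizes each column before PCA (correlation-matrix PCA) so that the arbitrary scale of z does not dominate; the helper names and the large-sigmaX parameter values are hypothetical.

```python
import numpy as np

def generate_sample(n, sigma_x, sigma_err, sigma_z, seed=None):
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma_x, n)
    y = np.exp(x) + rng.normal(0.0, sigma_err, n)
    z = rng.normal(0.0, sigma_z, n)
    return np.column_stack([x, y, z])

def pca_explained(data):
    """Correlation-matrix PCA: fraction of total variance per component, descending."""
    standardized = (data - data.mean(axis=0)) / data.std(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(standardized, rowvar=False))[::-1]
    return eigvals / eigvals.sum()

def transform(data):
    """Hypothetical choice: x' = exp(x), y' = y, z' = z, so that y' = x' + noise."""
    x, y, z = data.T
    return np.column_stack([np.exp(x), y, z])

original = generate_sample(1000, 2.0, 0.01, 1.0, seed=2)
transformed = transform(original)
before = pca_explained(original)
after = pca_explained(transformed)
print("explained variance before:", before.round(3))
print("explained variance after: ", after.round(3))
```

The fraction of variance left in the last component is a convenient indicator: it approaches zero when two variables are linearly related. For a small sigmaX, e^x is already close to 1 + x, so the improvement should be much smaller than in the large-sigmaX case.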

Option 2: use a data sample from your field of work

* Document the sample as indicated in the notes.
* Perform a PCA and discuss the results.
* Discuss whether a functional combination of the original variables can improve the PCA results.
* If so, apply it and discuss the results.
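As an illustration of such a functional combination, a common case in real data is a power law between strictly positive variables, y = a x^b, which becomes linear after taking logarithms: log y = log a + b log x. The sketch below uses hypothetical synthetic data as a stand-in for a real sample (a = 3, b = 1.5, 5% multiplicative scatter) and compares the variance left in the weakest principal component before and after the log transform.

```python
import numpy as np

def last_component_fraction(data):
    """Correlation-matrix PCA; fraction of variance left in the weakest component."""
    standardized = (data - data.mean(axis=0)) / data.std(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(standardized, rowvar=False))
    return eigvals[0] / eigvals.sum()  # eigvalsh sorts ascending, so index 0 is the smallest

rng = np.random.default_rng(4)
# Hypothetical stand-in for real data: y = 3 * x**1.5 with 5% multiplicative scatter.
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)
y = 3.0 * x**1.5 * rng.lognormal(mean=0.0, sigma=0.05, size=1000)
raw = np.column_stack([x, y])
logged = np.log(raw)  # log y = log 3 + 1.5 log x + noise: a linear relation

print(f"weakest component, raw: {last_component_fraction(raw):.4f}")
print(f"weakest component, log: {last_component_fraction(logged):.4f}")
```

The log transform only applies to strictly positive variables; ratios or differences of variables are other candidate combinations, depending on the physics of the sample.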

In both cases, include in your submission the data file(s) used, in Weka (ARFF) format.
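ARFF, Weka's native text format, consists of an `@relation` line, one `@attribute` declaration per column, and an `@data` section with comma-separated rows. A minimal standard-library sketch for writing numeric samples follows; the function name `write_arff`, the relation name, and the example rows are hypothetical, and the sketch assumes purely numeric attributes whose names contain no spaces (names with spaces would need quoting).

```python
import io

def write_arff(stream, relation, names, rows):
    """Write numeric rows to a text stream in Weka's ARFF format."""
    stream.write(f"@relation {relation}\n\n")
    for name in names:
        stream.write(f"@attribute {name} numeric\n")
    stream.write("\n@data\n")
    for row in rows:
        # One comma-separated line per object, values in attribute order.
        stream.write(",".join(repr(float(v)) for v in row) + "\n")

buffer = io.StringIO()
write_arff(buffer, "sample1", ["x", "y", "z"], [(0.1, 1.105, -0.3), (-0.2, 0.82, 0.5)])
print(buffer.getvalue())
```

To write a real file, pass an open file instead of the buffer, e.g. `with open("sample1.arff", "w") as f: write_arff(f, "sample1", ["x", "y", "z"], rows)` (file name hypothetical).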

Available from: Wednesday, 17 November 2010, 10:30
Due date: Wednesday, 1 December 2010, 10:30

Upload a file (maximum size: 10 MB)